AIDmut-Seq: a Three-Step Method for Detecting Protein-DNA Binding Specificity

ABSTRACT Transcription factors (TFs) and their regulons make up the gene regulatory networks. Here, we developed a method based on TF-directed activation-induced cytidine deaminase (AID) mutagenesis in combination with genome sequencing, called AIDmut-Seq, to detect TF targets on the genome. AIDmut-Seq involves only three simple steps, including the expression of the AID-TF fusion protein, whole-genome sequencing, and single nucleotide polymorphism (SNP) profiling, making it easy for junior and interdisciplinary researchers to use. Using AIDmut-Seq for the major quorum sensing regulator LasR in Pseudomonas aeruginosa, we confirmed that a few TF-guided C-T (or G-A) conversions occurred near their binding boxes on the genome, and a number of previously characterized and uncharacterized LasR-binding sites were detected. Further verification of AIDmut-Seq using various transcriptional regulators demonstrated its high efficiency for most transcriptional activators (FleQ, ErdR, GacA, ExsA). We confirmed the binding of LasR, FleQ, and ErdR to 100%, 50%, and 86% of their newly identified promoters using in vitro protein-DNA binding assays. Real-time RT-PCR data validated the intracellular activity of these TFs in regulating the transcription of those newly found target promoters. However, AIDmut-Seq exhibited low efficiency for some small transcriptional repressors such as RsaL and AmrZ, with possible reasons involving fusion-induced TF dysfunction as well as low transcription rates of target promoters. Although there are false-positive and false-negative results in the AIDmut-Seq data, preliminary results have demonstrated the value of AIDmut-Seq as a complementary tool for existing methods.
IMPORTANCE Protein-DNA interactions (PDI) play a central role in gene regulatory networks (GRNs). However, current techniques for studying genome-wide PDI usually involve complex experimental procedures, which prevent their broad use by scientific researchers. In this study, we provide an in vivo method called AIDmut-Seq. AIDmut-Seq involves only three simple steps that are easy to operate for researchers with basic skills in molecular biology. The efficiency of AIDmut-Seq was tested and confirmed using multiple transcription factors in Pseudomonas aeruginosa. Although there are still some defects regarding false-positive and false-negative results, AIDmut-Seq will be a good choice in the early stage of PDI study.
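As a rough illustration of the SNP-profiling step named in the abstract, the following Python sketch keeps only the substitution types expected from AID activity (C-to-T, or G-to-A when the deaminated cytosine lies on the opposite strand) and counts them in fixed-size genomic windows. The tuple layout `(position, ref, alt)`, the function names, and the 500-bp window are assumptions made for illustration only; the text here does not describe the authors' actual pipeline or file formats.

```python
from collections import Counter

# Substitutions expected from cytidine deamination: C->T on one strand,
# which reads as G->A when the deaminated C is on the opposite strand.
AID_SIGNATURE = {("C", "T"), ("G", "A")}


def aid_type_snps(snps):
    """Keep only SNP calls whose (ref, alt) pair matches the AID signature.

    `snps` is an iterable of (position, ref_base, alt_base) tuples.
    This layout is a hypothetical input format, not the authors' own.
    """
    kept = []
    for pos, ref, alt in snps:
        pair = (ref.upper(), alt.upper())
        if pair in AID_SIGNATURE:
            kept.append((pos, pair[0], pair[1]))
    return kept


def count_per_window(snps, window=500):
    """Count AID-type SNPs per fixed-size window (window size is an assumption)."""
    return Counter(pos // window for pos, _, _ in aid_type_snps(snps))


if __name__ == "__main__":
    calls = [(1200, "C", "T"), (1250, "G", "A"), (1300, "A", "G"), (9000, "c", "t")]
    print(aid_type_snps(calls))
    print(count_per_window(calls))
```

Windows with unusually many AID-type SNPs would then be candidates for TF-binding regions; any threshold for "unusually many" would have to come from control data such as the AID-only strain mentioned in the replies.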

expression level, AID variant, linker variant, fusion direction etc. In the second part of the work, the method was used to identify binding sites of other P. aeruginosa TFs. In particular, three activators (ExsA, GacA, ErdR) and three repressors (RsaL, GntR, HmgR) were taken into consideration.
The overall text is too concise, to the detriment of clarity. In fact, several passages are not clear (only some examples are given in the specific comments below). Spectrum research articles have no word or reference limits; hence things can be better explained. The underlying rationale of the work is good, and the results obtained with LasR as model TR are convincing enough. The second part of the work is weak (see specific comments below). Finally, there are a lot of minor comments, only some of these are reported below.
1) Lines 1-63: the basic principles underlying each cited method (e.g. ChIP-seq; DAP-seq; SELEX; DamID; Calling cards) should be concisely but clearly explained. The differences between in vitro and in vivo methods should be highlighted. Authors state that the proposed method is feasible also for "junior investigators" and "interdisciplinary researchers"; hence, also a researcher not very expert in molecular biology should be able to understand and appreciate advantages and disadvantages of each method in comparison with the proposed one.
2) Experiment of figure 2: Authors should indicate which arabinose concentration was used, and the AID∆ expression level should be determined. Lines 118-120 and Lines 163-170: these paragraphs are not clear enough, please reformulate/explain better.
3) Experiment of figure 3; Figure 3L: newly identified LasR binding sites should be validated also in vivo by using transcriptional fusions and/or Real Time RT-PCR. This panel should be a separate figure.
4) Results obtained with the three repressor proteins (RsaL, GntR, HmgR) clearly indicate that this method is not appropriate for the detection of DNA binding sites of repressors (e.g. see the summary in figure 4C). Authors do not properly highlight this important result. As also stated by , the AID enzyme works on single-strand DNA that originates during the formation of the transcription open complex. Hence, it makes sense that the method cannot work well for transcriptional repressors. This should be clearly discussed.
5) Among these repressors, RsaL is perhaps the most well-known and several papers with EMSA assays have been published. Authors should cite the papers produced between 2005 and 2009 by Giordano Rampioni et al.
6) Results obtained with the activator and dual TRs are not fully convincing. In particular, control EMSA assays have been provided only for the dual TR FleQ (Figure S3). EMSA should be provided also for ExsA, GacA, ErdR, AmrZ. In addition, transcriptional fusion experiments or RT-PCR experiments should be carried out to validate in vivo the EMSA results. Concerning FleQ, only half of the tested promoters showed a clear band-shift after binding of FleQ (PA2393, PA2619, PA2653, PA2955, PA4981). I understand that this is a lot of work; perhaps authors could limit their work to FleQ and another transcriptional activator.
Minor comments (partial revision, only up to line 137). Line 64, please explain Ugi gene function and why this is important; Lines 64-68, not very clear, please reformulate; Line 71, AID from which organism?; Line 73, what do you mean by toxic? This is not clear enough, please reformulate and explain better; Lines 75-76, what exactly is the "dCas9-guided MS2-AID" mutation generator? It is likely that many readers will not know what the authors are talking about; Line 76, please use "could" instead of "will"; Line 85, please use "could cost" instead of "costs"; Line 88, what exactly is AID∆? What is the difference compared to the wild type AID?; Line 95, the LasR inducer is named N-3-oxo-dodecanoyl-homoserine lactone and should be abbreviated as 3OC12-HSL; Line 105, please add a reference for the tested promoters; Line 137, not very clear, please clarify. Perhaps authors want to say that "high expression levels of AID732-TF may lead to the detection of a large number of binding sites with weak binding affinity".
Reviewer #2 (Comments for the Author):
In this manuscript, the authors developed a new method, called AIDmut-seq, to identify binding sites of transcriptional factors in vivo. The AIDmut-seq method is performed in three steps: fusion protein expression, extraction of genomic DNA, and sequencing/SNP profiling. This approach is easy to use for junior and interdisciplinary investigators with only a basic understanding of molecular biology. To establish the AIDmut-seq platform, the authors optimized the fusion direction, linker type, AID variant, induction time and arabinose concentrations. After that, AIDmut-seq was conducted to examine binding sites of the quorum sensing regulator LasR. Sequencing depths and reproducibility of SNPs in independent experiments were compared to evaluate the repeatability of AIDmut-seq. Further, AIDmut-seq was applied to other transcriptional factors in P. aeruginosa such as FleQ, AmrZ, ExsA, etc. Subsequently, EMSA was conducted to validate detected target sequences. Finally, the authors compared the AIDmut-seq method with the classic ChIP-seq method, and showed that AIDmut-seq has several advantages compared to ChIP-seq. In summary, this study developed a useful approach for detecting protein-DNA binding sites which might support the research field of transcriptional regulatory networks. I think this manuscript should be properly revised before its acceptance.
Minor comments:
1) Line 20 and 35, "in vivo" should be written in italic.
2) Line 64, the full name of SNPs should be shown at first mention.
3) Fig. 3A-3L should be reordered according to the order they appear in the text, as well as Fig. 4A-4C.
4) Manufacturer and affiliated states of reagents and kits in Methods should be provided.
5) Line 359, "Pseudomonas" should be written in italic.
6) A large number of species names are not italicized, such as Pseudomonas aeruginosa at line 491 in References. The format of some references is wrong. For example, the first letter of each word of the paper title is capitalized. The authors need to revise this.
7) Strains, plasmids and primers used in this study are listed in supplemental material.
8) Line 25-26 and line 40 in supplemental material, "P. aeruginosa" and "Escherichia Coli" should be written in italic.
9) Line 40 in supplemental material, pET28a instead of pet28a.
10) Line 172 in supplemental material, LasR instead of lasR.

Preparing Revision Guidelines
To submit your modified manuscript, log onto the eJP submission site at https://spectrum.msubmit.net/cgi-bin/main.plex. Go to Author Tasks and click the appropriate manuscript title to begin the revision process. The information that you entered when you first submitted the paper will be displayed. Please update the information as necessary. Here are a few examples of required updates that authors must address:
• Point-by-point responses to the issues raised by the reviewers in a file named "Response to Reviewers," NOT IN YOUR COVER LETTER.
• Upload a compare copy of the manuscript (without figures) as a "Marked-Up Manuscript" file.
• Each figure must be uploaded as a separate file, and any multipanel figures must be assembled into one file.
For complete guidelines on revision requirements, please see the journal Submission and Review Process requirements at https://journals.asm.org/journal/Spectrum/submission-review-process. Submissions of a paper that does not conform to Microbiology Spectrum guidelines will delay acceptance of your manuscript.
Please return the manuscript within 60 days; if you cannot complete the modification within this time period, please contact me. If you do not wish to modify the manuscript and prefer to submit it to another journal, please notify me of your decision immediately so that the manuscript may be formally withdrawn from consideration by Microbiology Spectrum.
Thank you for submitting your paper to Microbiology Spectrum.
Re: Spectrum03783-22 (AIDmut-seq: A three-step method for detecting protein-DNA binding specificity) Thank you for submitting your manuscript to Microbiology Spectrum. When submitting the revised version of your paper, please provide (1) point-by-point responses to the issues raised by the reviewers as file type "Response to Reviewers," not in your cover letter, and (2) a PDF file that indicates the changes from the original submission (by highlighting or underlining the changes) as file type "Marked Up Manuscript -For Review Only". Please use this link to submit your revised manuscript -we strongly recommend that you submit your paper within the next 60 days or reach out to me. Detailed instructions on submitting your revised paper are below.
ASM policy requires that data be available to the public upon online posting of the article, so please verify all links to sequence records, if present, and make sure that each number retrieves the full record of the data. If a new accession number is not linked or a link is broken, provide production staff with the correct URL for the record. If the accession numbers for new data are not publicly accessible before the expected online posting of the article, publication of your article may be delayed; please contact the ASM production staff immediately with the expected release date.
The ASM Journals program strives for constant improvement in our submission and publication process. Please tell us how we can improve your experience by taking this quick Author Survey.
Reviewer comments:

Reviewer #1 (Comments for the Author):
This study describes the development and validation of a new method for the identification of transcription factor (TF) binding sites in bacterial genomes. Pseudomonas aeruginosa was used as model organism. The method, called AIDmut-seq, is based on the arabinose-dependent expression of the TF fused to the activation-induced cytidine deaminase (AID) enzyme. The two proteins are separated by a flexible linker. During bacterial growth in the presence of arabinose, AID should make C-T or G-A conversions in the DNA sequence near each TF binding site in the target bacterial genome. The target genome should be deleted in the specific TF gene and, most importantly, also in the ung gene (coding for uracil-N-glycosylase), in order to impair the repair of AID-induced mutations. The latter are ultimately detectable by whole genome sequencing. In the first part of the work, the LuxR-like activator LasR was used as model TF to set up the method and define fusion protein expression level, AID variant, linker variant, fusion direction etc. In the second part of the work, the method was used to identify binding sites of other P. aeruginosa TFs. In particular, three activators (ExsA, GacA, ErdR) and three repressors (RsaL, GntR, HmgR) were taken into consideration.
The overall text is too concise, to the detriment of clarity. In fact, several passages are not clear (only some examples are given in the specific comments below). Spectrum research articles have no word or reference limits; hence things can be better explained. The underlying rationale of the work is good, and the results obtained with LasR as model TR are convincing enough. The second part of the work is weak (see specific comments below). Finally, there are a lot of minor comments, only some of these are reported below.
Reply: We thank the reviewer for giving these important and helpful comments. We have revised the manuscript according to the reviewer's comments, including rewriting the abstract, introduction, and results section and supplementing several real-time RT-PCR results in the main text. The second part of the work was reorganized. The major and minor comments were addressed point by point. We look forward to further suggestions from the reviewer.
1) Lines 1-63: the basic principles underlying each cited method (e.g. ChIP-seq; DAP-seq; SELEX; DamID; Calling cards) should be concisely but clearly explained. The differences between in vitro and in vivo methods should be highlighted. Authors state that the proposed method is feasible also for "junior investigators" and "interdisciplinary researchers"; hence, also a researcher not very expert in molecular biology should be able to understand and appreciate advantages and disadvantages of each method in comparison with the proposed one.
Reply: Thanks for the comment. We have added detailed descriptions of all the cited methods in the introduction section of our revised manuscript (page 3-7, line 57-67, line 79-96). In addition, we have added descriptions of the differences between in vivo and in vitro methods (page 4-5, line 71-78).
2) Experiment of figure 2, Authors should indicate which arabinose concentration was used and AID∆ expression level should be determined. Lines 118-120 and Lines 163-170, these paragraphs are not clear enough, please reformulate/explain better. Reply: Thanks for the comment. The arabinose concentration we used is 0.4%, corresponding to an intracellular AID∆ expression of 7~8 μM, as determined by quantification of SfGFP expression at the same arabinose concentration under microscope. We have added the information in the revised manuscript (page 8, line 148). In addition, we have reformulated the descriptions for line 118-120 and line 163-170 as below: Line 118-120 was rewritten as (page 9, line 170-181): "Besides, mutations generated in the promoter or coding sequences of a gene can affect its expression level or result in a loss of gene function, both of which may reduce the growth rate of a cell, leading to a decreased proportion of the mutated cells in the whole population after multiple generations. This will reduce the detected mutation frequencies of some mutations. To minimize the impact of possible growth rate reduction on the detection of genomic mutations, we added the inducers at an initial of 1.0 and diluted bacterial culture 5× for each 12 hours of shaking. Using this approach, the bacteria can take several generations to complete the C-T conversions induced by AIDΔ, while the population will not experience too many generations, which will eliminate those mutants with low growth rates. We observed the highest mutation frequency in all promoters with a culture time of 24 hours (two rounds of 12-hour culture)." Line 163-170 was rewritten as (page 11-12, line 220-231): "To eliminate these false-positive SNPs, we extracted the shared mutations of the AID732LasR_b1 genome in three independent experiments. 
According to our sequencing results of AID732pJN_b1, the average probability of stochastic mutation ($p$) generated by AID732 through the experimental procedure was less than 10 per base pair. We assume that AID732-LasR has a similar probability of generating stochastic mutations. Thus, the probability that one mutated base pair occurs in all three independent experiments is $p^3$. The average number of shared mutations from three independent experiments is then $N \cdot p^3$. Here $N$ is the total number of base pairs of the genome, which is within the range of 10 ~10 for common bacterial species. Therefore, $N \cdot p^3$ is far less than 1.0. That is, less than one shared stochastic mutation can be detected from three independent experiments. Thus, those stochastic false-positive results can be eliminated."
3) Experiment of figure 3; Figure 3L: newly identified LasR binding sites should be validated also in vivo by using transcriptional fusions and/or Real Time RT-PCR. This panel should be a separate figure.
Reply: Thanks for the comment. We have moved Figure 3L  ns, non-significant; *, p<0.05; **, p<0.01; ***, p<0.001.
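The replicate-intersection argument quoted in the reply to comment 2 can be checked numerically. The sketch below writes p for the per-base probability of a stochastic mutation and N for the genome length in base pairs (symbol and function names are chosen here for illustration), and assumes independent replicates with a uniform p. The value p = 1e-4 is purely hypothetical, since the exponent in the quoted passage did not survive in this copy; the genome length of about 6.3 Mb is the approximate size of the P. aeruginosa PAO1 genome.

```python
def expected_shared_stochastic(p, genome_bp, replicates=3):
    """Expected number of positions mutated by chance in every replicate.

    Assumes replicates are independent and each base pair mutates with the
    same probability p, so a given position is hit in all replicates with
    probability p ** replicates.
    """
    return genome_bp * p ** replicates


def shared_mutations(*replicate_calls):
    """Return the SNP calls present in every replicate (set intersection)."""
    sets = [set(calls) for calls in replicate_calls]
    return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    # Hypothetical p; approximate PAO1 genome length.
    print(expected_shared_stochastic(p=1e-4, genome_bp=6_300_000))

    # Toy replicate call sets of (position, ref, alt); only shared calls survive.
    rep1 = [(100, "C", "T"), (250, "G", "A"), (900, "C", "T")]
    rep2 = [(100, "C", "T"), (250, "G", "A"), (4000, "G", "A")]
    rep3 = [(100, "C", "T"), (250, "G", "A")]
    print(sorted(shared_mutations(rep1, rep2, rep3)))
```

Under these assumed values the expected count of shared stochastic mutations is on the order of 1e-5, which is consistent with the reply's conclusion that it is far below one. The estimate depends strongly on p, and it does not cover non-random false positives such as mutation hot spots that recur in every replicate.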

4) Results obtained with the three repressor proteins (RsaL, GntR, HmgR) clearly indicate that this method is not appropriate for the detection of DNA binding sites of repressors (e.g. see the summary in figure 4C). Authors do not properly highlight this important result. As also stated by , the AID enzyme works on single-strand DNA that originates during the formation of the transcription open complex. Hence, it makes sense that the method cannot work well for transcriptional repressors. This should be clearly discussed.
Reply: Thanks for the comment. We agree with the reviewer that AIDmut-seq has some defects when applied to transcriptional repressors, as exemplified by the results of RsaL and AmrZ. For GntR and HmgR, we identified two and one of their known targets on the P. aeruginosa genome, respectively, which represent all of their known genomic targets. Hence, it is not certain whether there are additional targets of GntR and HmgR that were missed by AIDmut-Seq. In fact, we are currently considering that there should be a proper window of the intracellular expression level of AID732-TF fusion proteins. In this window, the chimeric TFs are abundant enough to bind their genomic targets but not so overexpressed that they disable the transcription of target promoters. As suggested by the reviewer, we have collected AIDmut-seq results for transcriptional repressors in a separate part of the results section (and in Figure 6), and we discussed the possible reasons for the failure of AIDmut-seq to identify RsaL and AmrZ targets (page 19, line 382-391).
5) Among these repressors, RsaL is perhaps the most well-known and several papers with EMSA assays have been published. Authors should cite the papers produced between 2005 and 2009 by Giordano Rampioni et al.
Reply: Thanks for this kind suggestion. We have cited these papers in the revised manuscript (page 17-18, line 357-361).
6) Results obtained with the activator and dual TRs are not fully convincing. In particular, control EMSA assays have been provided only for the dual TR FleQ (Figure S3). EMSA should be provided also for ExsA, GacA, ErdR, AmrZ. In addition, transcriptional fusion experiments or RT-PCR experiments should be carried out to validate in vivo the EMSA results. Concerning FleQ, only half of the tested promoters showed a clear band-shift after binding of FleQ (PA2393, PA2619, PA2653, PA2955, PA4981). I understand that this is a lot of work; perhaps authors could limit their work to FleQ and another transcriptional activator.
Reply: Thanks for the critical comment. We agree with the reviewer that AIDmut-seq also identifies some false-positive results, as exemplified by several newly found FleQ targets (fdx1 and fimW), and that in vivo RT-PCR experiments should be conducted. In the revised manuscript, we focused on the two transcriptional regulators FleQ and ErdR in a separate part of the results section. The SNP spectra from AIDmut-seq, the in vitro EMSA results, and the in vivo real-time RT-PCR data for FleQ and ErdR targeted promoters were collected together in a new figure (Figure 5).